1) Took v031 (cd-hit-est of 10 assemblies) and kept only sequences >20,000 bp, then reran cd-hit-est on that subset.
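The length selection in this step was done in Galaxy (the input file name suggests a Tabular-to-FASTA conversion). A minimal sketch of an equivalent filter is below, assuming a plain FASTA input. The function names `read_fasta` and `filter_by_length` and the toy records are hypothetical, not part of the original workflow.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA handle."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records, min_len=20000):
    """Keep records whose sequence is strictly longer than min_len bp."""
    return [(h, s) for h, s in records if len(s) > min_len]


# Toy input (hypothetical); a small threshold keeps the demo readable.
demo = io.StringIO(">a\nACGT\nACGT\n>b\nAC\n")
kept = filter_by_length(read_fasta(demo), min_len=4)
for header, seq in kept:
    print(f">{header}\n{seq}")
```

With the default `min_len=20000` this keeps only sequences longer than 20,000 bp, which is the cutoff named in the note above.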
./cd-hit-est -i /Volumes/Bay4\ scratch/temp/Galaxy56-[Tabular-to-FASTA_on_data_55].fasta -o /Volumes/Bay4\ scratch/temp/CgigasBAC_cdhit -M 2500
total seq: 60
longest and shortest : 203422 and 84264
Total letters: 8610155
Sequences have been sorted
Approximated minimal memory consumption:
Sequence : 8M
Buffer : 1 X 2068M = 2068M
Table : 1 X 16M = 16M
Miscellaneous : 4M
Total : 2098M
Table limit with the given memory limit:
Max number of representatives: 4194304
Max number of word counting entries: 50200207
comparing sequences from 0 to 60
60 finished 53 clusters
Apprixmated maximum memory consumption: 2150M
writing new database
writing clustering information
program completed !
Total CPU time 148
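
cd-hit-est writes the cluster membership next to the output FASTA, in a file with `.clstr` appended to the `-o` name (here that would be `CgigasBAC_cdhit.clstr`). To see which of the 60 sequences were merged into the 53 clusters, that file can be parsed. The sketch below assumes the usual `.clstr` layout (a `>Cluster N` header followed by one line per member, with the representative marked by `*`). The function name `parse_clstr` and the toy contig names are hypothetical.

```python
import re

# Member lines look like: "1<TAB>850nt, >contigB... at +/99.10%"
MEMBER_RE = re.compile(r">(.+?)\.\.\.")


def parse_clstr(lines):
    """Map each cluster name to a list of (sequence_name, is_representative)."""
    clusters = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            current = line[1:]
            clusters[current] = []
        elif current is not None:
            match = MEMBER_RE.search(line)
            if match:
                # The representative sequence line ends with "*".
                clusters[current].append((match.group(1), line.endswith("*")))
    return clusters


# Toy cluster file (hypothetical names and lengths, not the real run).
toy = """>Cluster 0
0\t900nt, >contigA... *
1\t850nt, >contigB... at +/99.10%
>Cluster 1
0\t700nt, >contigC... *
"""

clusters = parse_clstr(toy.splitlines())
for name, members in clusters.items():
    if len(members) > 1:
        print(name, [seq for seq, _ in members])
```

On the real file, passing `open("CgigasBAC_cdhit.clstr")` in place of the toy lines and listing the clusters with more than one member should show which sequences were collapsed.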